[fix](scan) Respect byte budget when merging scan blocks#63296
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
Reviewed the full PR diff and existing review context. No existing inline comments were present, and there was no additional user focus.
Critical checkpoints: correctness has a blocking issue because disabled adaptive batch sizing now still enforces a 512MB merge cap; compatibility/regression risk is therefore not acceptable until fixed. I did not identify concurrency, memory-accounting, security, or API-surface issues beyond this behavior regression. Tests/build were not run by me in this review environment; the PR description reports object compilation only.
|
/review |
2 similar comments
|
/review |
|
/review |
There was a problem hiding this comment.
Automated review summary for head 4916f6f:
Overall opinion: approve / no new blocking issues found in this review pass.
Critical checkpoints:
- Correctness: The scanner merge path still enforces the row budget, adds the adaptive byte-budget check only when adaptive batch sizing is enabled, and preserves the empty-block merge behavior. The previously reported disabled-adaptive regression is not repeated here because the current head explicitly returns row-only behavior when config::enable_adaptive_batch_size is false.
- Compatibility / behavior regression: I did not find an additional compatibility regression in the one-file diff. Disabled adaptive mode matches the old row-budget-only behavior; enabled adaptive mode now prevents merged cached blocks from exceeding preferred_block_size_bytes by data bytes.
- Resource accounting / memory: The change continues to use allocated_bytes() for block usage accounting and bytes() for adaptive data-size budgeting, which is consistent with the PR intent. No additional accounting bug was identified.
- Tests / validation: I did not run the full BE unit test suite in this review environment. I did run git diff --check for the PR diff, which passed. The PR description reports object-level compilation validation.
- Existing review context: The existing inline thread in be/src/exec/scan/scanner_scheduler.cpp about adaptive-disabled behavior was treated as already-known and was not duplicated; the latest commit appears to address it.
- User focus points: review_focus.txt says there are no additional user-provided focus points, so no focus-specific issues were found.
|
run buildall |
|
PR approved by at least one committer and no changes requested. |
Summary
Fix scanner scheduler block merging so the adaptive batch size byte budget is respected when multiple scanned blocks are stitched into a cached block.
Root Cause
The scheduler merge path only checked the row count against
batch_size(). When adaptive batch size produced multiple blocks that were individually acceptable, the scheduler could still merge them into a much larger block because it ignoredpreferred_block_size_bytes().Changes
preferred_block_size_bytes()for the scan task.allocated_bytes()for memory accounting while usingbytes()for the adaptive data-size budget.Validation
git diff --check -- be/src/exec/scan/scanner_scheduler.cppninja -C be/ut_build_ASAN src/exec/CMakeFiles/Exec.dir/scan/scanner_scheduler.cpp.oNote:
./run-be-ut.sh --run --filter=ScannerContextTest.*was started earlier but stopped after it triggered a broad ASAN UT build; the changed object had already compiled successfully.